Timing circuit CAD

ABSTRACT

A method of generating a design for timing circuitry having plural rotary travelling wave component circuit sections comprises the steps of first dividing an area to be serviced into regions each small enough for there to be negligible inter-region transmission-line delay at target operating frequency. The perimeters of each said region are then divided into segments suitable for approximating lumped transmission-line LCR, and relevant parameters are determined so that time delays over each such segment are substantially equal to cycle time of desired frequency divided by twice the number of segments. The capacitance of each segment is determined to be substantially equal to the largest envisaged load capacitance (including or preferably differential load capacitance) plus loop-to-loop interconnect capacitance plus active device (say and usually transistor) capacitance of voltage-transition regenerative means, with addition to unloaded segments of padding capacitance calculated substantially to match the lumped line capacitance, and pitch/width of differential transmission-line conductors is calculated using Wheeler's formula constrained by metallization factor involved. Finally a suitable odd number of cross-overs of transmission-line conductors is ascertained to meet cross-talk desiderata, and the number of transmission line loops is specified to cover the area to be serviced, together with their interconnections, say conveniently at corners of rectangular said regions; and account is taken of up to all of interconnect inductance, conduction skin effects, cross-talk, and MOSFET parasitics at least for high frequency applications.

This invention relates to computer-aided design (CAD) for lay-out of circuitry for implementing timing of a travelling wave nature and has application, inter alia, to semiconductor integrated circuits (ICs or “chips”), including for very large scale (VLSI) circuits; and to methods, systems, devices and apparatus arising therefrom and/or by association therewith.

Design and layout of conventional clock-trees, perhaps particularly of H-tree type, for timing or clock signal distribution by CAD gets ever more difficult and problematic with increases in operating clock frequencies or speeds and in circuitry content, including clock signal buffering, refreshing and servo-type control as by phase-lock loops, and complexity of functional circuitry to be served; all compounded by increases in capability of fabrication technology further to reduce individual feature sizes.

It is understood that large H-tree type clock signal distribution lay-outs often require a great deal of detail work after even the most expensive of specialist CAD software has produced the best start-point it can. Indeed, the tendency seems now to be increasingly towards splitting VLSI circuitry into time domains or zones within which clock-tree lay-out design should be more manageable. It is, however, further understood that there is a significant incidence of intractabilities, including such as mis-allocations to such domains, that can cause back-tracking circuit design all the way to basic floor-plan levels.

Having different clocking domains or zones has led to different parts of VLSI circuits having different operating clock speeds. This also seems to have led to VLSI circuits being “rated” by the speed of the fastest clock onboard the chip concerned, though often for only a small part of its total functional circuitry. Whilst there must be circumstances in which VLSI circuits get overall benefit from particular functions running faster than the rest of the chip, it is self-evident that this cannot be anything like as advantageous and efficient as having the whole circuit capable of the fastest clock speed, i.e. avoiding effectively wasting clock cycles for data transfer between time domains or zones and/or more generally for data-in and data-out provisions.

UK patent number 2 349 524 (deriving from PCT application GB00/00175) relates to timing systems that are radically different from historical signal generation with output distributed by such as H-tree lay-outs. Specifically, this patent teaches timing circuitry in which signal generation and distribution are effectively integrated by way of plural rotary travelling wave (RTW) component circuits gridded together in a rotation- and phase-locked array with simultaneous production of timing waveforms at each component circuit and reliably available therefrom in any desired phase or phases.

Suitable such path is of a transmission-line nature and comprises a pair of parallel conductors with a cross-over after the manner of a Moebius loop, though the cross-over can be replaced by transformer action. The voltage transition can be fast as controlled by regenerative means distributed along the path, typically of active device usually transistor form, say back-to-back parallel pairs of opposed inverters; and a highly square bipolar wave-form can be produced with each half-cycle corresponding to the traverse time of the path. Such wave-form and its inverse is always available at any desired phase angle according to positions along said path of signal take-offs, typically for timing purposes. Any number of such loop paths, often a very large number for such as very fast (GHz and much higher speed) VLSI chips, can be “gridded” together in arrays, say connected corner-to-corner for convenient at least nominally rectangular loop geometries (though shape is not functionally significant), to present stable low-power clocking over much larger areas than that of each path loop (which will, of course, naturally reduce with speed). We refer to such circuitry as Rotary Travelling Wave Oscillator (RTWO), and more information is available in the above patent applications, also in the paper “Rotary Traveling-Wave Oscillator Arrays: A New Clock Technology” in IEEE Journal of Solid-State Circuits, vol. 36, no. 11 (November 2001).

In such array of rotary circuit components, whilst the term “distribution” is used, it is found to be helpful to consider such an array in terms of providing simultaneous availability of reliably similar signals in each and every component circuit. The duration of each timing signal pulse, typically one half-cycle of a bipolar wave-form inherently present differentially, is the traverse time for a signal transition round endless electromagnetically continuous signal paths that by their nature impose a nett signal inversion; and the leading (rising) and trailing (falling) edges of those timing pulses are represented by the signal transition and its nett inversion through two traversals of the endless signal paths; the signal transition being refreshed as frequently as desired within such path recirculations, both as to amplitude and steepness, so that power requirements are lowered by energy supply being simply related to “top-up” requirements of the signal transition and having no requirement for absorption of timing signal energy in the terminations that need such careful detail design in clock-tree type distributions.

As is to be expected, design and lay-out of such integrated generation and distribution (perhaps more accurately simultaneous availability) of travelling-wave type timing signal waveforms is inevitably different from H-tree clock distribution design and lay-out. This has been put forward as needing a solution, if possible one less problematic than for H-trees.

It is an object of this invention to demonstrate practicality of CAD lay-out for rotary travelling wave arrays, including use of and compatibility with industry standard simulation, typically involving the well known Spice software.

According to general aspects of this invention, viable such CAD layout involves methodology based on predictive calculation first, followed by simulation, then correctional adjustments with at least partial re-calculation(s) and re-simulation(s) to refine the predicted layout; and this is done in one general aspect with inductance taken into particular account, typically utilising a suitable available computer program for its extraction; and in another general aspect relative to attaining better or best results for at least one design parameter or desideratum, say as to minimising power consumption, and/or keeping to or below a settable target, as can be particularly advantageous.

Accordingly, various aspects of this invention arise in and from employing one or more of the following method and/or software steps:

-   dividing area to be serviced into regions each small enough for there to be negligible inter-region transmission-line delay at target operating frequency, such regions conveniently being rectangular
-   dividing perimeters of each such region into segments suitable for approximating lumped transmission-line LCR, typically at least eight such segments, preferably of a rectangular perimeter
-   determining relevant parameters so that time delays over each such segment are substantially equal to cycle time of desired frequency divided by twice the number of segments
-   determining capacitance of each segment to be substantially equal to the largest envisaged load capacitance (including or preferably differential load capacitance) plus loop-to-loop interconnect capacitance plus active device (say and usually transistor) capacitance of voltage-transition regenerative means
-   determining addition to unloaded segments of padding capacitance substantially to match the lumped line capacitance
-   determining inductance for each segment from the lumped line capacitance
-   determining pitch/width of differential transmission-line conductors using Wheeler's formula constrained by metallization factor involved
-   determining suitable odd number of crossovers of transmission-line conductors to meet cross-talk desiderata
-   specifying number of transmission line loops to cover the area to be serviced and their interconnections, say conveniently at corners of rectangular said regions
-   taking account of up to all of interconnect inductance, conduction skin effects, crosstalk, and MOSFET parasitics at least for high frequency applications.
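By way of illustration only, the delay, capacitance, padding and inductance steps listed above might be sketched as follows. The sketch assumes the usual lossless lumped approximation in which the delay of a segment is sqrt(L*C); that relation, the function names and the example values are assumptions made for illustration and are not taken from this specification.

```python
import math

def segment_parameters(freq_hz, n_segments, c_load_max, c_interconnect, c_active):
    """Per-segment delay, capacitance, inductance and impedance (lossless LC assumption)."""
    # Delay per segment: cycle time divided by twice the number of segments.
    t_seg = (1.0 / freq_hz) / (2 * n_segments)
    # Lumped capacitance: largest load + loop-to-loop interconnect + regenerative devices.
    c_seg = c_load_max + c_interconnect + c_active
    # Assumed relation t = sqrt(L * C), hence L = t^2 / C.
    l_seg = t_seg ** 2 / c_seg
    z0 = math.sqrt(l_seg / c_seg)
    return t_seg, c_seg, l_seg, z0

def padding_capacitance(c_seg, c_present):
    """Padding needed to bring a lightly loaded segment up to the lumped value."""
    return max(c_seg - c_present, 0.0)

# Hypothetical example: 1 GHz target, eight segments, capacitances in farads.
t_seg, c_seg, l_seg, z0 = segment_parameters(1e9, 8, 100e-15, 20e-15, 80e-15)
print(t_seg, c_seg, l_seg, z0)
print(padding_capacitance(c_seg, 100e-15))
```

The inductance so obtained would then be the target for the conductor pitch/width calculation, which is not attempted in this sketch.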

Preferred application of design tool methodology software in implementing and embodying this invention is by way of response, preferably automatic response, to input in the form of a desired waveform frequency by generation of a suitable design lay-out, and such constitutes another aspect of this invention, preferably along with compensation for any loading according to input(s) as to location and size.

Such processing involves solving transmission-line equations for each section of the array network, preferably so as to maintain impedance and phase coherence by value(s) and position(s) of added padding capacitances and/or by selection of transmission-line geometry. The resulting lay-out can advantageously be displayed, say on a VDU screen of computer equipment software controlled as herein, preferably complete with showing simulation of stable and reliable waveform.

In terms of practicality, and compared with industry-standard simulation and re-simulation, this methodology is advantageous for CAD implementation hereof as rotary travelling wave (RTW) timing is found to be well-suited to predictive calculation that is quicker to perform than industry-standard simulation, though such simulation is more accurate than calculation; and correctional adjustment and recalculation hereof with re-simulation readily produces convergence to viability satisfying layouts.

Moreover, at least one simulation may be other than of industry-standard type and quicker, as RTW timing is also found to lend itself usefully to much quicker simulation hereof before use of industry-standard simulation, thus saving time.

For any overall IC design, or for parts of it if so desired, there will be design parameters and desiderata that are known or are reasonably estimable relative to timing signal requirements, e.g. target operating frequency or clock speed/rate and timing pulse rise/fall times, and loads to be timed as to their number and supply voltage requirements (which could be different for input/output and for logic); together with any relative physical location and proximity requirements of the loads, and their electrical characteristics including capacitive loading they represent, and their tolerance to phase skew; also the intended implementing technology typically including as to feature sizing, metallisation, and transistor characteristics. At least ranges can usually readily be produced for such parameters, characteristics and desiderata, even at quite early design stages, and this invention is applicable as much to early indication of viability as to later detailing of layout. This stage is part of what is hereinafter referred to as “specify”, and it is helpful to have this in a working screen display that may further include at least some selected items from the next paragraph.

There will be implementation parameters that will follow from the target technology, often further from the particular foundry to be used. There may be ranges to some of these, but others will be fixed, e.g. as to maximum operating voltage, conductor type (usually copper or aluminium) and thickness, maximum conductor current density, interconnects, and relating to available passive formations such as resistors and capacitors and active formations such as transistors.

Parameters and characteristics related to the RTW component circuits and their interconnection into an array are also readily quantifiable, including from what is in our aforementioned UK patent and also in the paper published in Vol 36, No. 11 (November 2001) of IEEE Journal of Solid State Circuits, the contents of which are to be taken as fully imported herein. These RTW component circuits are specifically envisaged with their travelling wave supporting transmission-line endless signal paths implemented using two essentially parallel conductive traces with a nett inversion effect.

Such effect is available from a “cross-over” between such trace conductors, or an odd number of cross-overs. Whilst there is no inherent dictation of shape(s) for such signal path(s), their localised interconnects in forming a simultaneously operating synchronised array have led rationally to same being at corners of substantially rectangular RTW component circuits with “virtual” such circuits apparent and effective in “holes” between actual such circuits.

Alternative operative areal geometries that are also very versatile are taught herein and form a specific inventive aspect hereof. Specifically, substantially rectangular geometries involve using layers of metallisation in orthogonal row-and-column fashion, one layer for row-following conductors and the other for column-following conductors, with vias at intersections for interconnects to secure the requisite Moebius twist configuration, thus the signal inversion. The rectangular aspect ratio can be chosen to make greater use, thus occupation, of one metallisation layer than the other, which can be advantageous as many other logic etc connects must be made. Specific implementation of the present invention will be described relative to such inventive geometry, but that is not to be taken as limiting. It is a matter of choice and convenience to adopt a particular geometry, including as to applying the same geometry to all such arrayed RTW component circuits.

Further constraint(s) may be applied, typically as to permissible widths and spacings of dual parallel conductor traces, and how much of a particular metallisation layer can be devoted to timing signal usage. Whilst, in principle, it is immaterial as to whether this or other constraints are treated as part of the “specify” stage, in practice it is advantageous to have this available on a “constrain” screen that might also usefully include or repeat access to some of the “specify” data, at least as data that might be changed independently of other “specify” data, say for ready adjustment purposes at least after simulation, but with general convenience of availability. The balance and/or overlap between “specify” and “constrain” screens is a matter of choice. These specified parameters/characteristics and constraints etc constitute a contextual database within which CAD layout design aspects and embodiments of this invention operate.

An important factor to note and emphasise is that reduction of feature sizes in fabrication of ICs and high operating frequencies or clock speeds work together in increasing the impact of inductance, and this is taken into account in a general aspect of implementing this invention. In pursuit of an alternative to quite slow processing using the well known FastHenry inductance extraction program for simulating inductance between the component rotary travelling wave circuits and IC signal conductors, a useful approximating routine has been developed.

This routine is seen as a specific aspect of this invention and is based on use of two-dimensional gridding of area about a component circuit conductor and calculation of mutual inductance per unit length, say for a no-thickness wire in the X and Y co-ordinate directions at each grid point, more accurately plural such wires in parallel; and integrating along a general wire in the same space for an approximation to mutual inductance that is typically within 15% of using FastHenry, and very much faster to do as a useful measure of likelihood of viability without specifying or constraining significant changes.

It is preferred herein to select a signal paths geometry that is helpful to implementing this CAD layout design invention, and regular geometry does assist in this way. Once such a regular geometry is decided, whether as a convenient standard or chosen from a repertoire thereof, and its parameters/characteristics are known, along with those of the IC implementation concerned, including as to transistors etc of regenerative cross-connections at intervals along the dual conductive traces of each RTW component circuit, a length for the signal paths is readily calculable as a function of the target frequency.

From such length and where such RTW signal paths serve sub-areas of the chip that are contiguous, or effectively so, a prediction for a corresponding number of RTW component circuits can be a simple function of such sub-areas related to the calculated signal path lengths and the total operative area to be serviced for the IC concerned. For regular substantially rectangular signal path geometries with their sides adjacent as in the above-mentioned cofiled application, their included area is a function of their aspect ratio, and the predicted requisite number of RTW component circuits follows most readily from the lengths of their sides. Ready derivation of such predicted number of RTW component circuits can also be in accordance with side lengths for corner-connected actual and virtual RTW component circuits, but see more later regarding clustering.
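A rough sketch of such a prediction is given below for side-adjacent rectangular loops. It assumes that an effective (loaded) phase velocity for the signal path is already known, so that one traversal of the path takes half a clock cycle; the velocity figure, the function name and the chip dimensions are hypothetical, and "virtual" loops of corner-connected arrays are not modelled.

```python
import math

def predict_loop_count(freq_hz, v_phase, aspect, chip_w, chip_h):
    """Estimate loop side lengths and the number of loops tiling a chip area.

    v_phase is an assumed effective phase velocity (m/s) of the loaded line;
    aspect is the width/height ratio of each rectangular loop.
    """
    # One traversal of the loop corresponds to half a cycle of the clock.
    perimeter = v_phase / (2.0 * freq_hz)
    height = perimeter / (2.0 * (1.0 + aspect))
    width = aspect * height
    # Loops laid side by side in rows and columns to cover the area.
    count = math.ceil(chip_w / width) * math.ceil(chip_h / height)
    return perimeter, width, height, count

# Hypothetical example: 2 GHz, heavily loaded line, square loops, 9 mm x 9 mm area.
print(predict_loop_count(2e9, 4e7, 1.0, 9e-3, 9e-3))
```

As the text goes on to note, a count obtained this way is a starting point rather than an absolute.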

Such predicted number of RTW component circuits need not be treated as an absolute, as geometry alone can change it in relation to a nominal signal path length, and changing electrical parameters can affect nominal path length.

Alongside “specify” and/or “constrain” stages, routines are provided for the loads to be grouped or clustered, and then to be routed to connect with the RTW component circuits. These routines are seen as specific inventive aspects hereof.

Clustering is needed because there will usually be a much larger number of individual loads to be supplied with timing signals than there are RTW component circuits. This might be seen as problematic in the context of the industry norm of aiming for synchronous application of timing signals, and each RTW component circuit inherently having only one position along its path from which to obtain any exact particular phase. However, provisions can be made for single timing signal take-off positions from each RTW component circuit to feed into branches or small networks for multiple connections to plural loads.

Moreover, routing of connects for the loads is aided by appreciation that loads as functional circuitry of ICs generally have a tolerance to skew, i.e. the extent to which a timing signal can be off its exact nominal phase and still work correctly. This tolerance translates into a length along the signal path, i.e. the conductor traces, of the RTW component circuit concerned. Given that a full cycle of the travelling wave in such signal path requires two signal transition traversals of the path (one traversal corresponding to one polarity of pulse in preferred bipolar waveforms), a skew tolerance of up to 10% (which is not unusual) allows up to 20% of the signal path to be used for timing signal takeoffs without loss of nominally synchronous operation. At least when combined with use of small networks, this generally gives a possible take-offs multiplier that is more than adequate to service the loads of even ICs with a very large number of loads to receive timing signals, such as a microprocessor, and to keep take-off connections short enough to avoid “ringing” reflections during the rise time (thus avoid need for provision of buffers), which is readily calculable as an RTW component circuit feature, say typically less than about 0.5 millimetre for 2 GHz clocking, and less for higher clock rates.
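The arithmetic relating skew tolerance to usable path length can be put as a short sketch. The function name and the 10 mm path length are hypothetical; the factor of two simply reflects the two traversals per full cycle stated above.

```python
def takeoff_window(skew_tolerance, path_length):
    """Fraction and length of the signal path usable for nominally synchronous take-offs.

    skew_tolerance is a fraction of the full clock cycle (e.g. 0.10 for 10%).
    """
    # A full cycle needs two traversals of the path, so a phase window of
    # s cycles spans a fraction 2*s of one physical traversal.
    fraction = min(2.0 * skew_tolerance, 1.0)
    return fraction, fraction * path_length

# Hypothetical example: 10% skew tolerance on a 10 mm signal path.
print(takeoff_window(0.10, 10e-3))
```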

Furthermore, the resulting potentially highly asymmetric loading of the signal paths of the RTW component circuits is readily compensated by specifying dummy or padding capacitances applied elsewhere round each such signal path concerned, and this may also form part of preferred clustering routines, i.e. after load allocation to the RTW component circuits thus capacitive loading known. This is also seen as a specific inventive aspect, including as may be done on an earlier predictive calculation basis from mean or maximum loading per RTW component circuit and assuming whatever dummy or padding capacitance corresponds per side-length thereof then deduction (maybe addition relative to mean) for each actual load assigned, or as may be done by use of physically in-built and specifiable capacitance provisions at each said side then duly specifying according to assigned loads so as to achieve either of substantially uniform impedance loading all round each RTW component circuit's endless signal path or a viable progression of such impedance loading, i.e. that maintains desired signal fidelity at least where take-off connects are required.

It is, of course, the case that any move away from the fully synchronous paradigm towards multi-phase operation or even progressive phasing of operation of functional circuitry, say following the order of functions performed, would be very readily accommodated by RTW clocking.

A clustering routine, as a specific inventive aspect hereof, can be based on what loading can be driven by each RTW component circuit, say as an average therefor with a reasonable safety margin built-in relative to any maxima, which also aids achieving a reasonably even spread of loads capacitance between the RTW component circuits as is beneficial to overall operation. Thus, preferred clustering routines may operate by a first decision as to the number of clusters, say on the basis that there should be more than the result of dividing the total for all capacitive loads (which should be available for any proposed IC at least on a not-more-than basis) by the maximum loads capacitance to be allowed for each cluster (which will be available as a feature of the rotary travelling wave component circuits). A first assignment of loads to clusters could be simply as one per RTW component circuit available, and can take account of what is known for each load about requirements for physical location on the IC chip and its skew tolerance, otherwise aimed at reasonably even spread of capacitance amongst/between the clusters. However, exemplary clustering routines hereof (as will be shown and described) can be free of criticality regarding first assignment, i.e. will sort out to practical assigning as they progress.
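The first decision as to the number of clusters might be sketched as below. The `margin` parameter is an assumption added to illustrate the built-in safety margin, and the function name and example values are hypothetical.

```python
import math

def min_cluster_count(load_caps, max_cap_per_cluster, margin=0.0):
    """Smallest cluster count strictly greater than total capacitance / per-cluster maximum.

    margin inflates the total (e.g. 0.2 for 20%) as an assumed safety allowance.
    """
    total = sum(load_caps) * (1.0 + margin)
    # "More than the result of dividing": take the floor and add one.
    return math.floor(total / max_cap_per_cluster) + 1

# Hypothetical example: 100 loads of 10 fF each, at most 100 fF per cluster.
print(min_cluster_count([10] * 100, 100))
```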

Thus first assignment can be processed cluster-by-cluster to determine their centroids in terms of capacitance weighted averaging of X-Y coordinates and phase tolerances of loads in a cluster relative to the total capacitance of that cluster, then processed load-by-load to calculate distances to each cluster centroid and, if and as appropriate, shift any load to the cluster having the nearest of any nearer centroids. At each such shift, the centroids of the clusters concerned will be recalculated, and these steps iterated until either no loads are moved or a pre-set maximum is reached for iterations. Using known distance functions, such routine has proved successful for 40,000 loads and 100 clusters on a Pentium 600 MHz computer, and it is feasible to split larger numbers of loads (and clusters) into sectors grouped by physical location criteria.
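A minimal sketch of a routine of this kind follows. It is not the routine of FIG. 10; the Euclidean distance over X, Y and skew tolerance, the `phase_weight` scaling and all names are assumptions chosen for illustration. Running weighted sums are kept per cluster so that the centroids of the two clusters concerned are updated at each shift.

```python
import math

def cluster_loads(loads, assignment, n_clusters, phase_weight=1.0, max_iter=50):
    """Move loads to the cluster with the nearest capacitance-weighted centroid.

    loads: list of (x, y, skew_tol, cap) with cap > 0.
    assignment: initial cluster index for each load.
    """
    assignment = list(assignment)
    # Running sums [x*cap, y*cap, tol*cap, cap] and member counts per cluster.
    sums = [[0.0] * 4 for _ in range(n_clusters)]
    count = [0] * n_clusters

    def update(cluster, load, sign):
        x, y, tol, cap = load
        for k, v in enumerate((x * cap, y * cap, tol * cap, cap)):
            sums[cluster][k] += sign * v
        count[cluster] += sign

    def distance(cluster, load):
        s = sums[cluster]
        cx, cy, ct = s[0] / s[3], s[1] / s[3], s[2] / s[3]
        return math.sqrt((load[0] - cx) ** 2 + (load[1] - cy) ** 2
                         + (phase_weight * (load[2] - ct)) ** 2)

    for load, c in zip(loads, assignment):
        update(c, load, +1)

    for _ in range(max_iter):
        moved = False
        for i, load in enumerate(loads):
            candidates = [c for c in range(n_clusters) if count[c] > 0]
            best = min(candidates, key=lambda c: distance(c, load))
            if best != assignment[i] and distance(best, load) < distance(assignment[i], load):
                # Shift the load and update both centroids concerned.
                update(assignment[i], load, -1)
                update(best, load, +1)
                assignment[i] = best
                moved = True
        if not moved:
            break  # no loads moved in a full pass
    return assignment
```

In this sketch a cluster that starts empty is simply ignored; evening out total capacitance between clusters would need an additional term in the distance or a separate balancing pass.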

Preferably, and advantageously, such clustering takes account of an average for load capacitance as clustered in order to even out the loading of the clusters at least to some useful effect.

Inventive routing routines hereof take the results of clustering to the RTW component circuits and can take further account of skew tolerances of the loads to plot actual connects within the available “skew tolerance” length of the RTW signal paths concerned, whether directly or via networks to suit the skew tolerances. Given that industry-standard Spice-based simulations of entire arrays hereof must inevitably be slow, another specific inventive aspect hereof is directed to a routine for more easily and quickly first checking/verification of proposed RTW timing signal arrays. Highly advantageously, this leads to similarly quick and easy remedial measures for arrays that are unsatisfactory at such first checking, or can be significantly improved; and constitutes a stage herein called “solve”.

Our above-mentioned UK patent put forward a rule for each junction in and between component circuits in RTW arrays to have equality of energy into and out of that junction, with consequential power and impedance implications. This further aspect of this invention develops that teaching to checking and making adjustment to achieve substantial equality of impedances at each junction along with doing likewise as to travelling wave traverse times round each RTW component circuit signal path having an integer multiple relationship with the clock frequency period. Preferred implementation is by way of a data-base for the structure of the RTW component circuits together with built-in test functions for verifying and modifying to improve a layout as tested.

To this end, suitable simulation alternative to industry-standard simulations uses data for nodes (or junctions) and paths (or lines) connecting the nodes. For the dual parallel transmission line conductors, corresponding paths are paired between their interconnections (nodes), and share a mutual inductance. Connects for timing signal take-off to loads constitute other paths making nodes with the paired lines or paths. Each node has an associated prescribed timing signal phase, while each path has associated capacitance and inductance. At least the paired paths also have mutual capacitance and mutual inductance, and the direction of travel of the timing signal can be taken into account.

Suitable verification involves calculating the impedances of the paths and node-by-node summation as to incoming and outgoing signal flows being equal thus in cancelling relationship, otherwise modification is applied; and calculating the time delays path-by-path as to matching (at the clock frequency concerned) relative to the timing signal phases at the nodes connected by the path concerned, otherwise other modification is applied.
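One possible reading of such verification is sketched below. It assumes lossless paths, so that delay is sqrt(L*C) and impedance is sqrt(L/C), and it interprets the node-by-node summation as a balance of incoming and outgoing admittances; both are assumptions, as are the data layout, the tolerance and the names used.

```python
import math

def verify_paths(paths, node_phase_deg, freq_hz, tol=0.05):
    """Return (delay_errors, node_errors) for a node/path description.

    paths: list of dicts with keys 'src', 'dst', 'L' (henries), 'C' (farads).
    node_phase_deg: prescribed timing signal phase at each node, in degrees.
    """
    period = 1.0 / freq_hz
    delay_errors = {}
    net = {n: 0.0 for n in node_phase_deg}    # outgoing minus incoming admittance
    gross = {n: 0.0 for n in node_phase_deg}  # outgoing plus incoming admittance
    for idx, p in enumerate(paths):
        delay = math.sqrt(p["L"] * p["C"])     # assumed lossless LC delay
        y = math.sqrt(p["C"] / p["L"])         # admittance = 1 / impedance
        # Delay expected from the phase difference between the connected nodes.
        step = (node_phase_deg[p["dst"]] - node_phase_deg[p["src"]]) % 360.0
        expected = step / 360.0 * period
        if abs(delay - expected) > tol * period:
            delay_errors[idx] = delay - expected
        net[p["src"]] += y
        net[p["dst"]] -= y
        gross[p["src"]] += y
        gross[p["dst"]] += y
    # Flag nodes whose incoming and outgoing admittances fail to cancel.
    node_errors = {n: v for n, v in net.items() if abs(v) > tol * gross[n]}
    return delay_errors, node_errors
```

Paths or nodes reported in the returned dictionaries would be the candidates for modification; mutual inductance and capacitance between paired paths are left out of this sketch.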

Suitable modification involves making changes to the inductance and/or capacitance of the path concerned, and impedance and time delay can be changed independently using built-in data manipulation functions operative to find matches by change to one or more items concerned. This is readily done for time delays as the reference is always the same known value. For impedance mismatches, effective operation is achieved by increasing impedance along the path from the higher to the lower of the impedances concerned, which can be viewed as surplus and deficit of impedance, respectively. Preferably, this is done after pairing up nodes that have impedance errors that are substantially equal but opposite, then grouping for two or more to cancel one.
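That impedance and time delay can be changed independently follows, under a lossless assumption, from delay being sqrt(L*C) and impedance being sqrt(L/C): scaling L and C together changes only the delay, while scaling them oppositely changes only the impedance. A small sketch, with hypothetical names and values:

```python
import math

def delay_and_impedance(l, c):
    """Delay and characteristic impedance of a lossless path (assumed model)."""
    return math.sqrt(l * c), math.sqrt(l / c)

def retune(l, c, delay_scale=1.0, impedance_scale=1.0):
    """Scale the delay and impedance of a path independently of each other."""
    # L -> L * d * z and C -> C * d / z give delay * d and impedance * z.
    return l * delay_scale * impedance_scale, c * delay_scale / impedance_scale

# Hypothetical example: raise impedance by 20% while leaving the delay unchanged.
l2, c2 = retune(1.25e-8, 5e-12, impedance_scale=1.2)
print(delay_and_impedance(l2, c2))
```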

The “solve” stage will produce a calculated parametric detailing for the RTW component circuits, both as to conductive trace sizing and spacing within what has been specified and constrained and as to active cross-connection between the traces.

It can be useful to allow expert inspection of such detail, and to further allow changes to be made if adjudged to be necessary or advisable, whereupon any corresponding adjustments can be made automatically to “specify” and “constrain”, and the “solve” stage repeated. This stage conveniently shows the cross-connection circuit diagram with parameters marked or available at a mouse-click, likewise the cross-connection circuitry at a respective position, and is called “circuits” herein.

If there has been indication of non-viability at any calculation stage, this is preferably, and often quite readily, accompanied by indication of which of the parameters could usefully be changed, advantageously suggesting at least the sense of worthwhile change, say up or down as to value. Consequential changes will, of course, result in another iteration of the calculation phase, i.e. up to and including “solve”.

After the “solve” stage as such, or as following any inputs in a “circuits” stage, the calculation phase is complete, and simulation can be done in accordance with industry-standard techniques, typically at present by application of Spice software. As is well-known, this requires heavy-duty data processing, which can take a long time on even the most powerful of currently available so-called personal computers (PCs) or even server-class microprocessors. Alongside developing the CAD layout software hereof, a powerful “engine” has been designed and built to speed up and generally facilitate processing Spice-type simulation, specifically as plural microprocessors interconnected and programmed to perform parallel processing on the data concerned, and same is seen as another specific inventive aspect hereof.

The results of simulation can conveniently be presented as a schematic outline layout of the component rotary travelling wave circuits with whatever may be desired by way of indication of parameters, and preferably further with capability to select any position for showing the timing signal waveform as present thereat, which is seen as representing a further inventive aspect hereof.

This waveform inspection may result in certain parts of the transmission lines of at least some of the component rotary travelling wave circuits being deemed as not suitable for making timing signal take-off connections. It is advantageous for this waveform assessment procedure to be automated, at least to some useful extent, say by stipulating a “worst-case” for waveform acceptability, and identifying, marking up and specifying lengths of transmission lines that do not measure up and are not to be used for timing signal take-off load connections, and this is also seen as a specific inventive aspect. Perhaps more usefully, however, resort may be had to adjustment of aforesaid padding or dummy capacitances as far as can alleviate any problems at least as to affecting where timing signal take-offs are convenient.

The “simulation” phase will produce results more accurately than reasonably to be expected of the calculation phase (in which the checking/verification is intended to be included, of course). Insofar as such discrepancies mean that the simulated performance is outside specified target performance, adjustments can be made with further iteration(s) of the calculation phase as relevant thereto, with same followed by another “simulation” stage or stages, advantageously at least to some extent as a matter of user choice.

Reverting to one of the general aspects of this invention, namely preferred context of power usage minimisation, or meeting a chosen and specified target therefor, as an overall calculation constraint, it is a further specific inventive aspect that change to power usage indications be permitted, preferably at least at some stages and further preferably at any time, whereupon there will be automatic recalculation.

Indeed, another specific inventive aspect that can have related utility is seen in “tear-off” type availability of any desired parametric or operational information at any stage and on any screen related thereto, whether further for immediate capability to change or automatic transfer of or to part or all of the stage screen concerned.

The final stage will be to go from acceptable simulation results to “layout” as such.

Exemplary specific implementation for this invention is shown in and described relative to the accompanying diagrammatic drawings, in which:

FIG. 1 shows an array of interconnected transmission-line loops for producing bipolar differential wave-forms;

FIG. 2 is a circuit diagram for one such loop;

FIG. 3 is an idealized diagram including regeneration provision;

FIG. 4 shows screen display for four interconnected loops and auto-generator menu;

FIG. 5 shows a rig useful for cross-talk assessment;

FIG. 6 shows screen display for skew analysis;

FIG. 7 shows screen display for jitter analysis;

FIG. 8 shows a fragment of IC functional circuitry to receive timing signals;

FIG. 9 develops FIG. 8 to show clustering of timing signal loads;

FIG. 10 is a flow chart for a clustering algorithm or routine;

FIG. 11 shows rectangular grid geometry for RTW component circuits;

FIG. 12 shows clustered loads connected to RTW circuitry of FIG. 11;

FIG. 13 is a flow chart for an inductance extraction algorithm or routine;

FIG. 14 is a flow chart for a design and verification algorithm or routine;

FIGS. 15 and 15A–E are block outline and further features of a parallel simulation processor;

FIG. 16 shows successive functions for one embodiment of software hereof;

FIG. 17 is a program-style flow chart for such software;

FIG. 18 is a general overview of such software;

FIG. 19 shows outline screen content and FIGS. 19A–F variant details;

FIG. 20 is a basic transmission line cross-connecting circuit diagram;

FIG. 21 shows a cross-connection circuit specifiable in various respects;

FIG. 22 shows a circuit for specifiable dummy or padding capacitance;

FIG. 23 shows an RTW array about free space and its bounding; and

FIG. 24 shows rectangular gridding with interconnection features.

Rotary Travelling-Wave Oscillator (RTWO) arrays, see FIG. 1, can provide a conceptually simple solution to timing signal (clock) generation and distribution problems of low-submicron integrated circuits. As replacement for clock trees, PLLs and DLLs it offers a solution believed to be readily scalable up to multi-GHz frequencies. The RTWO concept relies on inductance to give stable clock generation with multiple-phase capability.

As full ramifications of inductance extraction are still relatively unfamiliar conceptually to most VLSI digital circuit designers, it has been accepted that application of RTWO to clocking (Rotary Clocking) in the VLSI market should, probably must, be aided by CAD support, providing CAD tool teaching to de-skill the Rotary Clock design process. The aim is transparent calculation of significant electromagnetic and RF effects present in a target clock design, and reflection of these during simulation and design iteration. It is envisaged that a final output in GDSII or Gerber format could make the methodology and related software functionally equivalent to existing H-tree generation tools.

The RTWO concept involves operation not by resonance, but by the generation and maintaining of a rotating voltage transition in an endless differential electromagnetic path. A twist (or odd number of twists) in the path forces phase inversion during rotation, so that there is effective oscillation as represented in the resulting wave-form. Power consumption is low because of inherent energy recycling action, thus requiring only top-up energy by its regenerative provisions, see back-to-back diode pairs between loop transmission-line conductors in FIG. 2 and more circuit and idealized detail in FIG. 3. For a sharp voltage transition, the rotary action produces highly square waves directly. The regenerative circuitry is shown employing transistors to initiate and maintain the voltage transition and its rotation, thus availability of oscillation wave-form output, and to aid in providing rotation lock. Arrays of interconnected rings, conveniently fabricated usually mainly on top layer metallisation, act as an advantageous substitute for a conventional clock H-tree. Such an array is inherently phase-locked and can cover an arbitrary size. Taps can be made to take off local clock signals as required. All phases of the clock are available simultaneously, see marked 45-degree positions.

Superficially, the basic topology may look like a ring oscillator, but operation is fundamentally different. Capacitance from clock loading becomes part of the transmission-line mechanism and energy is recirculated within the structures as the voltage transition rotates.

In principle, the perceived problems of VLSI clock generation amount to the generation and distribution of a high frequency clock signal over a large chip with controlled skew and jitter for fast edge rates. In practice, the following arise for resolution, particularly during design:

-   minimising skew between clock signals over the active area of chip as caused by variable load capacitances
-   controlling edge rates with lossy interconnects
-   mitigating the effects of variability of active components
-   handling transmission-line effects at high frequency including return-current paths and inductance
-   minimising power consumption and synchronous supply surging
-   coping with the effects of induced noise from the clock to other signal lines.

The methodology chosen involves defining the clocking interconnect prior to cell placement. This is advantageous from an electromagnetics perspective and is not new in itself (having been used by such as IBM for its S/390 processor clocking), though post insertion is also seen as feasible for this invention.

Rotary clocking networks are subject to a different set of design constraints compared with conventional clock H-trees, at least in the following respects:

-   RTWO lines are never terminated
-   capacitive loading is readily tolerated by design
-   differential RTWO action gives a well defined go-and-return current path.

However, issues arising from noise-coupling can still be problematic due to the high circulating currents involved. Analysis of a typical section of RTWO differential transmission line with (see FIG. 5) underlying metal traces and a victim trace used to simulate crosstalk, and related test results, have been convincing that it is sufficiently accurate to represent RTWO transmission-lines as a series connection of lumped elements. When parameters of drive transistors and parasitic coupling terms are added, a short section of the transmission-line can be modelled as in FIG. 3 (typical values shown).

The model circuit shows the most significant terms, i.e. the transmission-line inductance, series resistance, interconnect capacitance, clock signal load capacitance and transistor capacitance. It is to be noted that ACO represents an AC ground point (VDD or VSS). Transistor characteristics have only 2nd-order capacitive effects on the timing since they are operating in a transmission-line amplifier mode.

Equations governing the circuit include

Differential inductance (per unit length): $L_{perlen} = \left( \frac{\mu_{o}}{\pi} \right)\log\left\{ \left( \frac{\pi \cdot s}{w + t_{c}} \right) + 1 \right\}$

Impedance of a segment: $Z_{0} = \sqrt{\frac{L_{lump}}{C_{lump}}}$

Time delay over a segment: $t_{d} = \sqrt{L_{lump} \cdot C_{lump}}$

Overall operating frequency: $f_{osc} = \frac{1}{2\sqrt{L_{total} \cdot C_{total}}}$
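By way of non-limiting illustration only, the above relationships can be evaluated numerically as in the following Python sketch. The function names are hypothetical, the logarithm is assumed to be the natural logarithm, and s, w and t_c are assumed to denote conductor separation, width and thickness in metres, none of which is stated explicitly herein.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space, H/m


def diff_inductance_per_length(s, w, t_c):
    """L_perlen = (mu_0/pi) * ln(pi*s/(w + t_c) + 1), in H/m.

    Assumption: s, w, t_c are separation, width and thickness in metres.
    """
    return (MU_0 / math.pi) * math.log(math.pi * s / (w + t_c) + 1.0)


def segment_impedance(l_lump, c_lump):
    """Z_0 = sqrt(L_lump / C_lump), in ohms."""
    return math.sqrt(l_lump / c_lump)


def segment_delay(l_lump, c_lump):
    """t_d = sqrt(L_lump * C_lump), in seconds."""
    return math.sqrt(l_lump * c_lump)


def oscillation_frequency(l_total, c_total):
    """f_osc = 1 / (2 * sqrt(L_total * C_total)), in Hz."""
    return 1.0 / (2.0 * math.sqrt(l_total * c_total))
```

For a loop of N identical segments, L_total = N·L_lump and C_total = N·C_lump, so that f_osc = 1/(2·N·t_d), which agrees with each segment delay being the cycle time divided by twice the number of segments.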

Additional constraints on the RTWO system are:

-   signal inversion must occur on all (or most) closed paths
-   impedance should match at all junctions
-   signals should arrive simultaneously at junctions.

Convenient implementation software could have a GUI written in Tcl/Tk. The syntax of Tcl is very simple, which would help for users with limited programming experience. Tcl/Tk also has robust cross-platform support. C and C++ can be used where required for speed.

FIG. 4 shows a main design screen having a large canvas view of the clock design, preferably scaled directly from the custom physical layout database, a menu system, and an area for project notes.

The sidebar also houses the entry box that allows a user to enter a desired clock frequency, from which the software will look to generate a suitable clock design. There is also provision to compensate for any load on the clock. By a simple point & click, the user can specify the location and size of any clock load.

The processing hereof then solves the transmission-line equations for each section of the RTWO network. It maintains impedance and phase coherence by adjusting ‘padding’ capacitances (implemented with MOS capacitors), and adjusting the transmission-line geometry. Using all of the information available to it, the methodology and software hereof will estimate a viable, maybe ideal, physical layout to achieve a given frequency with a stable and reliable clock waveform, and display it on the screen.

From the set of lump-capacitance loads representing local clock stubs or buffers, the desired frequency and maximum metallisation utilisation limit, an internal layout database of closed-loop paths is calculated, impedance matched at junctions, and rotation-related phase inversion assured.

The basic design generation procedure is

-   divide area to be serviced into rectangular regions each small enough for there to be negligible inter-region transmission-line delay at target operating frequency
-   divide perimeters of each such region into at least 8 segments suitable for approximating lumped transmission-line LCR
-   determine parameters for time delays over each such segment to be nominally equal to cycle time of desired frequency divided by 16
-   determine capacitance of each segment to nominally equal the sum of the largest envisaged differential load capacitance, loop-to-loop interconnect capacitance and active transistor capacitance
-   determine addition to unloaded segments of padding capacitance substantially to match the lumped line capacitance
-   determine inductance for each segment from the lumped line capacitance
-   determine pitch/width of differential transmission-line conductors using Wheeler's formula constrained by metallisation factor involved
-   determine suitable odd number of cross-overs of transmission-line conductors to meet cross-talk desiderata
-   specify number of transmission line loops to cover the area to be serviced and their interconnections.
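The numerical steps of the above procedure (segment delay, lumped capacitance, padding and inductance) can be sketched as follows, purely as an illustration under stated assumptions. The helper name `design_segments` and its arguments are hypothetical; the sketch takes one load capacitance per segment of a single loop and does not attempt the geometric steps (Wheeler's formula, cross-overs, loop count).

```python
import math


def design_segments(f_target, segment_loads, c_interconnect, c_transistor):
    """Nominal per-segment values for one loop (hypothetical helper).

    f_target       desired operating frequency, Hz
    segment_loads  differential load capacitance on each segment, F
    c_interconnect loop-to-loop interconnect capacitance per segment, F
    c_transistor   active transistor capacitance per segment, F
    """
    n = len(segment_loads)
    if n < 8:
        raise ValueError("at least 8 segments are expected per loop")
    # Segment delay: cycle time divided by twice the number of segments
    # (cycle time / 16 for the minimum of 8 segments).
    t_d = 1.0 / (2.0 * n * f_target)
    # Lumped capacitance is set by the most heavily loaded segment.
    c_max_load = max(segment_loads)
    c_lump = c_max_load + c_interconnect + c_transistor
    # Padding brings every lighter segment up to the same lumped value.
    padding = [c_max_load - c for c in segment_loads]
    # Inductance follows from t_d = sqrt(L_lump * C_lump).
    l_lump = t_d ** 2 / c_lump
    return {
        "t_d": t_d,
        "c_lump": c_lump,
        "l_lump": l_lump,
        "z_0": math.sqrt(l_lump / c_lump),
        "padding": padding,
    }
```

The inductance so obtained would then be the target for the pitch/width calculation of the differential conductors.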

Verification can readily be by running a modified version of the industry standard Spice simulation tool on the design. This simulation includes the Spice LCR models and MOSFETs, as well as electromagnetic simulation results of multiple 4-port subcircuits by FastHenry and FastCap.

As RTWO architectures stabilise quickly, most simulations will yield meaningful results quickly, say within 30 seconds of initialization. This allows the methodology and software to refine the design, by iterating a number of times and making progressively smaller changes to the layout to achieve the desired frequency. This entire process is user-configurable, from the command line used to start Spice, to the maximum number of iterations allowed. Most designs should take only a short time, say less than 5 minutes, to achieve the required accuracy, with final pre-production processing no more than a few hours.

The electromagnetic simulation interface merits further mention. At GHz frequencies, skin effects are evident even in thin metal conductors. For highest accuracy, inductance and resistance are calculated using FastHenry in multi-pole mode. Dividing and segmenting can be fully automatic, targeting the current penetration of skin and proximity effects beyond the 9th harmonic of the clock frequency.
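To indicate the scale of conductor subdivision implied, the classical skin depth can be estimated as in the sketch below. This is an illustration only: the function name is hypothetical, the conductor is assumed non-magnetic, and the default resistivity is that of bulk copper, which is an assumption not taken from the present description. How such a depth would be handed to an extraction tool is not shown.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space, H/m


def skin_depth(frequency_hz, resistivity_ohm_m=1.68e-8):
    """Classical skin depth sqrt(rho / (pi * f * mu_0)), in metres.

    Assumption: non-magnetic conductor; default resistivity is bulk copper.
    """
    return math.sqrt(resistivity_ohm_m / (math.pi * frequency_hz * MU_0))


def finest_depth(clock_hz, harmonic=9):
    """Skin depth at the given harmonic of the clock frequency."""
    return skin_depth(harmonic * clock_hz)
```

Since skin depth varies as the inverse square root of frequency, the depth at the 9th harmonic is one third of that at the fundamental.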

In all but extreme cases, the methodology hereof should output a powerful and robust clock layout, ready for use on the users' own chip design.

In some cases, a specific design requirement may require more work, and the methodology hereof preferably allows users to influence, or even specifically lock, certain design variables. In software implementations, by navigating the menu system on the sidebar, users can alter almost all aspects of the design. For example, variables can be “locked”, which will force iteration to use the user-defined values, and attempt to achieve the desired frequency by altering only “unlocked” variables.

Alternatively, the user may invoke a Spice run on the current design, by simply clicking on “Run Spice”. This is much closer to a traditional design method, with the user entering the design parameters, and then viewing the results. As Spice runs, raw Spice data is read, and the graphical representation of the design can be coloured accordingly. In this way, it is possible to see the travelling-wave in action, preferably with on-screen display showing the clock frequency at all times.

Skew analysis can also be provided, see FIG. 6. This displays measurements from two points on the design (selectable from the main design screen). This functions in the same way as a standard oscilloscope, and allows quick evaluation of the clock waveform shape.

Jitter analysis can also be provided, see FIG. 7 for display relevant to the cycle jitter in presence of simulated power supply noise that can be of user-selectable amplitude and frequency.

Further preferably, provision is made for built-in links to a freely available Spice viewer, SignalFRAN, see FIG. 8. This allows more detailed measurement of the clock initialisation phase, and can be run simultaneously.

Outputting results can be by standard GDSII, say by simply selecting the required item from the menu for generation of a properly formatted output file from its internal layout database. Such file could be immediately ready for importing into the user's own design tool. The layout may then be subjected to the user's usual design checks (DRC/LVS etc.), and re-simulated as a complete design (Spice, or other simulation tool).

It is believed that the inter-active software embodying this invention will be best understood and appreciated from further outlining the context within which it is to operate and the objectives to be achieved, which starts by reference to FIGS. 8 and 9.

FIG. 8 shows part 10 of an IC layout of its functional circuitry as blocks 11, 11A with interconnects 12 representing logic signal flow within each timing signal or clock pulse. Some of the functional circuit blocks, see 11A, require timing signals for gating purposes, typically registers 13 for taking in or outputting data and/or instruction signals. For immediate purposes of this description, the functional circuitry 11 is taken as being substantially fixed physically, i.e. as to location and relative proximities, which is worst-case of what could be presented initially for clock layout purposes. This may not exclude all flexibility, e.g. may permit re-location within constraints such as to maximum lengths for the interconnects 12. Whatever latitude is permitted can be taken into account in the clock layout design hereof, if specified clearly.

There will usually be many more functional circuits 11A requiring timing signals than there are RTW component circuits, the endless travelling wave signal paths of which will each have only one position for any exact timing signal phase. FIG. 9 shows grouping of the blocks 11A requiring timing signals into what are herein called clusters, one shown with calculated conductive first connects 15A for timing signals to its registers 13 respectively, another likewise for first connects 15B. The registers 13 constitute timing signal loads. The first connects 15A and 15B go to common points 15X and 15Y, respectively, that are calculated as geometric centres (centroids) for the respective clusters of loads 13 served by respective first connects 15A, B. The centroids 15X, Y will have calculated second conductive connects 16A, B to positions on each of different RTW component circuits that correspond to the required phase of timing signals.

This clustering of FIG. 9 is achieved by the exemplary routine of FIG. 10. From entered/extracted data (31), overall total load capacitance and an average loads capacitance for each RTW component circuit are calculated (32A), and said total divided (32B) by said average. The theoretical minimum number of clusters might be little more than such total divided by the maximum loads capacitance that can be driven by each RTW component circuit, and this could be used as a start point, but with the practicality of provision for increasing them. It is preferred to use a lower value for driven capacitance, i.e. said average loads capacitance, which can be set together with a practical margin below such maximum, and contribute usefully towards potential for achieving desirable even-ness of loading of the RTW component circuits. Ideally, the result, as a number of clusters, should not exceed the maximum number of RTW component circuits that could be provided; indeed, can usefully determine a lesser such number. FIG. 10 shows possibility of clustering-driven increase in the number of RTW component circuits, see dashed at 33, 34.

An initial allocation of loads to clusters (35) can be done in virtually any way, even including arbitrarily. One simple algorithm-driven way could be related to physical locations and targeted substantially equal numbers and/or summed capacitances in each cluster, say bearing in mind said average capacitance loading calculated for the RTW component circuits and any disparities of skew tolerances that can be readily spread within clusters. If multi-phase timing signals are to be used, say for multi-phase logic or even phase-graded to suit a flow of logic functions, this can also be taken into account. On the current fully synchronous, effectively one-phase-for-all, paradigm, only one position of the signal path of each RTW component circuit would be used, so it can be sensible to take some account of likely locations of those positions of the signal paths, or at least their spacings, say (bearing in mind typical skew tolerance) actually along a substantial usable part if not most of one side of each of rectangular such signal paths in an orthogonal grid lay-out for such circuits (see later for routing and FIG. 11).

A simple first allocation algorithm could be according to values of X- and Y-coordinates being within pre-set differences or ranges (see more below), and clustered load capacitances not exceeding said average, then allocating left-over loads to the cluster containing the X-Y nearest load, but might be even simpler, even arbitrary, as this first allocation is normally non-critical in view of what is achievable using the following steps of a preferred clustering routine, and as preferred geometries of the signal paths of the RTW component circuits effectively inherently militate against any load connects being longer than a maximum therefor that avoids “ringing” effects due to unwanted reflections.

These steps comprise the repeated step (36) of calculating the centroids of each cluster, and a repeating loop of calculating the distances of each load to the nearest cluster (38), then moving each load to the nearest cluster (39). Whatever new load-to-cluster allocations arise are fed back (41) to the clustering step (35); and the cluster centroids calculation (35) and load distances/movement etc steps (36–41) are repeated until no loads get moved, or a maximum iteration count is reached.

A distance metric should be chosen that will give acceptable convergence. Suitable such metrics within the mathematical competence of the inventors, but not in any way intended to be limiting, include

(A) $(X_{C} - X_{L})^{2} + (Y_{C} - Y_{L})^{2} + k \cdot F(|P_{C} - P_{L}|) + c \cdot G(C_{C} + C_{L})$

(B) $\{(X_{C} - X_{L})^{2} + (Y_{C} - Y_{L})^{2} + k \cdot F(|P_{C} - P_{L}|)\} \cdot G(C_{C} + C_{L})$

where subscripts “C” and “L” denote “cluster” and “load”

X, Y are usual Cartesian co-ordinate distances

P, C are phase and capacitance

k, c are user-defined skew tolerance and group capacitance scaling factors, e.g.

k=(required cluster size)²/F(phase skew tolerance)

c=(required cluster size)²/G(max total capacitance per group)

F, G are positive monotonically increasing mapping functions,

the aim being to increase rapidly when the arguments reach cut-off values for maximal total capacitance or phase tolerance. A good starting point has been found to be F(x)=G(x)=x², and 40,000 loads in 100 clusters have been processed using a 600 MHz Pentium with ease.
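The repeated centroid/reallocation loop with metric (A) and F(x)=G(x)=x² can be sketched as below, as a minimal illustration rather than the routine of FIG. 10 itself. The names `Load`, `summarise` and `cluster_loads` are hypothetical; taking the cluster phase as the mean of member phases, the cluster capacitance as the sum of member capacitances, and a round-robin initial allocation are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Load:
    x: float
    y: float
    phase: float
    cap: float


def F(v):
    """Mapping function; x**2 is the stated starting point."""
    return v * v


G = F


def summarise(loads, assign, n_clusters, previous):
    """Per-cluster (centroid x, centroid y, mean phase, total capacitance)."""
    stats = []
    for j in range(n_clusters):
        members = [ld for ld, a in zip(loads, assign) if a == j]
        if not members:
            stats.append(previous[j])  # empty cluster keeps its old values
            continue
        n = len(members)
        stats.append((sum(m.x for m in members) / n,
                      sum(m.y for m in members) / n,
                      sum(m.phase for m in members) / n,
                      sum(m.cap for m in members)))
    return stats


def cluster_loads(loads, n_clusters, k=1.0, c=1.0, max_iter=100):
    """Return a cluster index per load using metric (A)."""
    assign = [i % n_clusters for i in range(len(loads))]  # arbitrary start
    stats = [(0.0, 0.0, 0.0, 0.0)] * n_clusters
    for _ in range(max_iter):
        stats = summarise(loads, assign, n_clusters, stats)
        new_assign = []
        for ld in loads:
            costs = [(xc - ld.x) ** 2 + (yc - ld.y) ** 2
                     + k * F(abs(pc - ld.phase)) + c * G(cc + ld.cap)
                     for (xc, yc, pc, cc) in stats]
            new_assign.append(costs.index(min(costs)))
        if new_assign == assign:  # no loads moved: converged
            break
        assign = new_assign
    return assign
```

The iteration cap guards against allocations that oscillate rather than settle; a check of cluster capacitance against a maximum would follow separately.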

Of course, if there was any risk of over-long load connects, there could be a check against a pre-set maximum connect length, say as a first stage of moving loads between clusters, which is conveniently included in the clustering step 35, say with a margin in view of routing likely to result in paths to which those calculated have a hypotenuse relationship, see later.

One further checking step is shown (42), as to whether any cluster has greater than maximum loads capacitance. If so, the number of clusters will be increased (43) and the steps 35–42 repeated, if necessary after increase to the number of RTW component circuits (using 33, 34).

Loads data for this clustering routine can be readily determined, if not given, e.g. extracted from LEF/DEF format available at www.si2.org, or other open-access databases using automated script.

Completion of clustering can be a convenient stage at which to determine dummy or padding capacitances to even up capacitance round the signal paths of the RTW component circuits, and doing so may be effectively a final step (44) in the described clustering routine. This can and usually would be a first assessment to be followed by re-assessment later, say balancing for slight errors found in Spice-type industry-standard simulation. Adding capacitance to endless signal paths of a transmission line nature in an orderly way can compensate for non-uniformity introduced by connects to loads (which will be concentrated onto less than about 25% of the total signal path length for the one-phase fully synchronous paradigm), and contribute as far as can be to the ideal of signal path parts exhibiting the same gradually changing impedance especially between as well as through junctions. It follows that RTW component circuits would actually benefit from multi-phase timing signal measurements or even full phase-grading, as such could lend itself to more even loading of the signal path throughout its length. Indeed, it is noteworthy that full phase-grading (or flow as it may be called) would also reduce topology constraints on RTW component circuits and arrays, and actually simplify CAD design mainly to looking for the Kirchhoff-type junction conditions to be met.

When clustering is complete (44), routing is determined for actual connects of the loads as they will be set out in the IC concerned. A suitable inventive routine hereof takes advantageous account of functional circuitry to be timed or clocked generally having skew tolerance, which translates into twice the percentage of the signal path length of the RTW component circuits, which, for typical skew tolerance of at least 10%, conveniently translates to most of one side of substantially rectangular such signal paths being available for making load connects, even for quite highly asymmetric rectangular such signal paths.

A suitable and practically advantageous signal paths geometry is shown in FIG. 11, for convenience superimposed on the functional logic blocks of FIGS. 8 and 9. This geometry is basically substantially rectangular for signal paths as shown complete only for two column-adjacent paths 45A and 45B, see later for more on this geometry and its full areal coverage with active sides-sharing RTW signal paths, rather than corner-only sharing that leads to “virtual” servicing of a substantial part of the area serviced.

Reverting to the routing routine, the positions of the signal paths that are available for connects to loads at any particular phase or phases are known. For the particular contiguous asymmetric-rectangle array geometry shown with arrows for rotary signal flow, and for an IC following the one-phase synchronous paradigm, but with skew tolerance taken into account, these positions are at alternate row-following pairs of conductor traces, typically along a major central portion of the length of a longer side of the signal path they contribute to defining. This gives full pitch information for those portions so available for load connects at whatever particular phase, including as to length for any particular skew tolerance. If the grid array is pre-located, say by its relation to the area to be serviced with timing signals, the X-Y coordinates of these available signal path lengths follow, including relative to skew tolerance in the making of load connects. The routing routine could then simply be based on a first algorithmic step that finds the available connect length nearest to the also known centroid of each cluster, and makes a single connect accordingly (as 16A, B in FIG. 9), then typically with best registration to the exact nominal phase involved. Preferably, however, a second step looks for making load connects to the identified available signal path portion that ignore the centroid, and can make direct individual connects that take account of skew tolerance for each load of that cluster, say at least for the loads presenting larger capacitance. Orthogonal row/column parallel routing is indicated in FIG. 12, which shows a mix of direct load connects 55A, 56A and an effectively star-wired small network (55B) of connects as might be dictated by low load skew tolerances. The only constraint required is that no load connect, if longer than the sum of calculated portions 15 and 16 in FIG. 9, should exceed the known maximum for avoiding ringing reflections. Many will be shorter than the sum of the relevant calculated portions 15 and 16, and any that are longer should not exceed the margin referred to above, at least if the hypotenuse relation is used.

If there is scope for adjustment of the array of RTW component circuits, say to avoid coincidence or undue proximity to any logic signal lines (12 in FIG. 8) or destinations (13 in FIG. 8), that may be done within this routing stage, say immediately before or in conjunction with finding nearest centroid available signal path portions. An alternative or additional resource would be to exercise any latitude as to exact positions of the IC's functional blocks (11 in FIG. 8). This routing routine can readily extend to, or simply be used alongside other software for, the layout of the lines 12 for signals in and out of the functional blocks 11.

Given that the importance of inductance cannot be over-stated at very high clock speeds and very small feature sizes, including the hazards of cross-talk noise between RTW signal paths and signal lines to and from functional blocks, routing is advantageously followed by investigation of inductance. Whilst inductance extraction can be done using simulation software such as the well-known FastHenry, such tools tend to be rather slow, and it is preferred herein to use another inventive calculation routine hereof that can be up to about 15% less accurate, but is much faster.

Turning to FIG. 13, this inductance extraction routine involves selecting (61) a rectangular region about the RTW line and other wires of immediate interest, and imposing (62) a grid as large and as fine as desired accuracy requires. The RTW line is decomposed (63A) within the imposed grid into straight-line segments, and the other wires represented (63B) as weighted idealised no-thickness lines in parallel. For each grid point, and each parallel line, the mutual inductance per unit length is calculated (64) on a typical thin wire basis in the X and Y direction, specifically using the integral function

Inductance/unit length $= \frac{\mu}{4\pi}\int \frac{\mathbf{1}_{G} \cdot \mathbf{1}_{W}\, dl}{\sqrt{X^{2} + Y^{2}}}$

where the integral is along the RTW line segments

X and Y are distance from the line segment to grid point

$\mathbf{1}_{G}$ and $\mathbf{1}_{W}$ are the unit direction vectors of the grid element and the RTW line, respectively.

Mutual inductance along the other wire is then obtained (65A–C) by straight-line segmenting the other wires (65A), integrating (65B) the other wire segment unit direction vector and the grid position unit length inductance along the other wire through the grid area, and summing (65C) for the mutual inductance of the other wire. This routine can end with a step identifying undue mutual inductances and instigating adjustment(s), feasibly automatically indicating specific viable adjustment(s).
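The per-grid-point integral of step 64 can be approximated numerically as in the following sketch, offered only as an illustration of the thin-wire calculation. The function name and argument layout are hypothetical, a simple midpoint rule is assumed in place of whatever integration the routine actually uses, free-space permeability is assumed for μ, and the grid point must not lie on the segment.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space, H/m (assumed for mu)


def mutual_inductance_per_length(grid_point, grid_dir, seg_start, seg_end,
                                 steps=1000):
    """(mu/4pi) * integral of (1_G . 1_W) dl / sqrt(X^2 + Y^2), in H/m.

    grid_point  (x, y) of the grid element
    grid_dir    unit direction vector 1_G of the grid element
    seg_start, seg_end  end points of one straight RTW line segment
    """
    gx, gy = grid_point
    sx, sy = seg_start
    ex, ey = seg_end
    length = math.hypot(ex - sx, ey - sy)
    wx, wy = (ex - sx) / length, (ey - sy) / length  # unit vector 1_W
    dot = grid_dir[0] * wx + grid_dir[1] * wy
    dl = length / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dl  # midpoint of the i-th element along the segment
        px, py = sx + wx * t, sy + wy * t
        total += dl / math.hypot(px - gx, py - gy)
    return MU_0 / (4.0 * math.pi) * dot * total
```

For a grid element parallel to a segment of length l and at perpendicular distance d from its midpoint, the integral has the closed form 2·asinh(l/(2d)), which offers a quick check on the numerical result; perpendicular elements contribute nothing.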

The calculated lay-out is then subjected to another inventive routine hereof for first design verification by calculation much faster than industry-standard simulations. Indeed the purpose of this routine is to get a faster first result than using Spice type simulation, advantageously as a precursor with useful corrective potential before such industry-standard simulation. The innovative nature of the routine arises from its basis simply in rules for impedance matching at each junction between conductive traces involved in the array of RTW component circuits, and in their endless signal path travel times needing to be an integer multiple of the desired operating frequency period.

FIG. 14 shows a specific such routine starting from creation (71) of a database comprising “nodes” representing said junctions and “paths” representing interconnects of the nodes, with paths sharing a mutual inductance paired together to represent the dual trace transmission-line rotary signal path structure of the RTW component circuits arrayed together in FIG. 11. Data for each node will include its location and associated timing signal phase, and data for each path will include the direction of signal travel along it, at least its associated capacitance and inductance, advantageously further its relevant mutual capacitance and inductance to another path.

Perhaps somewhat artificially, the data-base (71) is shown supplying node data (71A) and path data (71B) separately to step sequences of the routine, one (72–75) correcting for time delays, the other (76–81) correcting for impedance mismatches, both of which can be done independently of the other.

The data-base (71) also contains data manipulation functions for getting matches by changing values of one or other of two items concerned. The time delay correction sequence is shown comprising calculating (72) signal transit times for the paths, path-by-path comparison (73) of those time delays with the timing signal phases for the nodes inter-connected by the path concerned, followed by adjusting (74) its capacitance and inductance to correct any mis-match (preferably without changing impedance), and updating the data-base; and repeating (75) steps 73 and 74 until all paths have been processed. The impedance correction sequence is shown being enabled (76) after all time delays have been processed (71–75), and proceeding by calculating (77) the path impedances (taking account of any changes from time delay correction), node-by-node calculation of total impedances of input and output lines of each node, storing non-zero difference results along with the node location or other identification, separating positive and negative impedance differences and grouping them (78) so that those that are equal and opposite are paired, and others each further associated or “paired” to more for cancelling out. Then, for each association of a positive and a negative impedance difference, this routine finds (79A) a route (preferably the shortest) along the paths between the nodes concerned and increases (79B) the impedances of the paths of the route concerned by the difference or partial difference concerned, while keeping the time delay constant. When all paired and plurally associated “pairings” have been processed, the adjusted RTW component circuits array is ready for industry-standard Spice-type simulation.
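The two kinds of adjustment described (changing delay without changing impedance, and changing impedance without changing delay) follow directly from the lumped relations Z₀ = √(L/C) and t_d = √(L·C), as the following sketch illustrates. The function names are hypothetical, and the sketch treats a single path in isolation, leaving out the data-base, node pairing and route finding.

```python
import math


def retime_path(l, c, t_target):
    """Scale L and C by the same factor: delay changes, impedance does not.

    With L' = a*L and C' = a*C, sqrt(L'*C') = a*sqrt(L*C) while L'/C' = L/C.
    """
    scale = t_target / math.sqrt(l * c)
    return l * scale, c * scale


def rescale_impedance(l, c, z_target):
    """Scale L up and C down by the same ratio: impedance changes, delay does not.

    With L' = b*L and C' = C/b, sqrt(L'/C') = b*sqrt(L/C) while L'*C' = L*C.
    """
    ratio = z_target / math.sqrt(l / c)
    return l * ratio, c / ratio
```

Because each adjustment leaves the other quantity untouched, the time delay sequence and the impedance sequence can be run one after the other without undoing each other's corrections.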

FIG. 15 shows an outline of a sixteen-way parallel processor 80 arising from perceived advantageous speeding up of Spice simulation processing hereof for VLSI ICs, such as microprocessors. Sixteen computing units 81 and an overall controller/scheduler 82 for parallel processing have ethernet interconnection to which FIGS. 15C and 15D relate. FIG. 15A shows one computing unit 81 comprising a motherboard 83 carrying a microprocessor 84, pre-loaded Spice-based program 85, RAM 86, and ethernet connection 87. FIG. 15B shows the controller/scheduler 82 as comprising ethernet connection 88, four hub units H1–H4, and a master hub and server unit 90. FIG. 15C shows ethernet connections between the computing units 81 and to the hubs H1–H4 (according to the numerals in the computing unit boxes). FIG. 15D is a diagrammatic indication using double-headed arrows for node-sharing interconnections between four computing units clustered to handle simulation of one section of an RTW component circuit.

In this parallel Spice-type processing, the RTW array for simulation is sectioned for each computing unit 81 to deal with a different section, with simulated voltage values going directly between versions of Spice in each computing unit, the linkages concerned emulating the real linkages in the RTW array structure.

The alternatives of computing units 81 being connected together directly or via a hub speed up data transfer between time steps, especially when two computing units share an RTW circuit node.

Spice-based transient analysis is done in time steps and the parallel processor hereof involves transfer of RTW node voltages between two computing units (FIGS. 15C and 15D), the received voltage being used to calculate current source strength for the shared node for the next time step. The current source strength is the ratio of the difference (V1–V2) between the voltages at the two computing units concerned to the resistance of the (virtual) link between them, which should be low to enhance node coupling. Damping is then applied to the current source strength as an exponential function to combat current surges. It was found that this was more stable and tractable than modelling as voltage sources.
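The shared-node coupling just described can be sketched as follows. The ratio (V1–V2)/R is taken from the text; the particular exponential damping law (a first-order approach to the target current with an assumed time constant `tau`) and all function and parameter names are illustrative assumptions, because the description does not give the exact form of the exponential function.

```python
import math


def coupling_current(v1, v2, r_link):
    """Current source strength for the shared node: (V1 - V2) / R."""
    return (v1 - v2) / r_link


def damped_current(i_prev, i_target, dt, tau):
    """Move the applied current toward i_target with exponential damping.

    Assumed form: the applied current closes a fraction
    1 - exp(-dt / tau) of the gap per time step, which limits surges
    when the link resistance is low.
    """
    alpha = 1.0 - math.exp(-dt / tau)
    return i_prev + alpha * (i_target - i_prev)
```

For example, with a 0.1 ohm virtual link and a 0.2 V difference the undamped strength is about 2 A, of which only a fraction is applied in the first time step.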

Controlling the time step size centrally (62) enhances performance and accuracy, particularly by keeping it constant across the cluster. The time step is set to the largest value that satisfies the error tolerance constraints of each node, and all of the computing units lock satisfactorily to the same simulation time.
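A minimal sketch of such central step selection, assuming each computing unit reports the largest step its own error tolerances would accept (the reporting mechanism and the names used are hypothetical):

```python
def global_time_step(max_acceptable_steps):
    """Pick one time step for the whole cluster.

    max_acceptable_steps maps a computing-unit identifier to the largest
    step (in seconds) that unit's error tolerances allow. The largest
    step acceptable to every unit is the smallest of these maxima.
    """
    return min(max_acceptable_steps.values())
```

Every unit then advances by the same step, so the simulation times stay locked.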

A suitable software interface emulates interfacing to a single computer, so the actual parallel processing does not affect Spice simulation results. Spice simulations are readily available to the user of the CAD software hereof at all points of the simulated system, thus allowing direct access to simulated frequency, voltage anywhere, current flow, etc; thus deriving of y data from sequential nominal same-phase points on waveforms, rise/fall times, rotation direction, etc; direct control of sizing any transistor, inductance component values, take-off loading, also padding capacitance value and location, Spice time-step, etc; and effective control of frequency according to global scaling of interconnect inductance, investigation of rise/fall by segmenting more finely, etc.

A useful interactive protocol for this Spice simulation processing comprises:

-   1. First having accuracy low but simulation speed high by way of setting a coarse time step, say 5% of the projected timing signal cycle.
-   2. Consequently quickly reaching an initial stable Spice result determined by checking for squareness of waves throughout.
-   3. Checking rotation directions and whether rise/fall times have acceptable values for expected operation, otherwise continuing Spice processing for longer.
-   4. Increase accuracy to medium to get results more truly representative of the RTW array.
-   5. Record simulated operating frequency after a few cycles and reduce or increase preloaded dummy capacitance all round the RTW array if too low or too high, respectively, and repeat until satisfactory.
-   6. Examine rise/fall times and waveshape quality, including voltage/current ratio (Z) everywhere to locate worst impedance problems and make corrective local inductance/capacitance adjustments, iterating until improvement is deemed satisfactory or no more is achievable.
-   7. Apply extremes of tunability to check such as switched capacitor and varactor effectiveness.
-   8. Run worst combination of process variables, temperature, voltage and check if specification is still met; if not, consider redesign for such as more area for tuning components.

Having outlined application of arrayed RTW component circuits to distributed generation and supplying of timing signals to ICs, and described and illustrated individually innovative routines useful for CAD design and layout of such an RTW array, more general CAD aspects hereof are now reviewed.

FIG. 16 shows a typical overall design procedure hereof that includes predictive calculation (91) and corrective calculation (101) with the latter iterating essentially the same sequence of first simulation (92, 102), layout (93, 103), extraction (94, 104), and second simulation (95, 105).

FIG. 17 shows translation of FIG. 8 into a program flow chart and diagram demonstrating the basic pattern of assembling an application-specific data-base (96) from which calculation (107) phases always precede simulation phases (108) with accompanying layout/extraction (109) and iteration from predictive to corrective calculation, feasibly further recalculation iteration(s) until satisfactory, and calculation indicated as being in a context including taking account of inductance and optimising against power consumption.

Viability of the inter-active CAD software hereof is believed to be well supported by its capability to operate satisfactorily using generally open-access other software. FIG. 18 is an overview of typical such use. Specifically, the software as developed to date and described thus far is dubbed Rotary Expert (110) and now shown in conjunction with the Gemini database (112) from which access is available to such as LEF/DEF (113) and API etc (114), and the recently released Cadence database DbView (115), also the well-known Spice, Fast Henry, Fast Cap and other Extractors (116), and the invaluable Magic (117), all relative to Rotary Expert's graphic user interface 120.

In relation to the inter-active CAD software hereof the graphic user interface 120 is now described in more detail with reference to FIG. 19 and detail FIGS. 19A–19F. One panel 121 of permanently available selectables is shown at the left-hand side and their selection brings up screens specific to progress of the CAD software. These screens will vary to some extent according to selection of access shown at the top of the screen to correspond with “preliminary” (P), “intermediate” (I), “advanced” (A) and “guru” (G). Broadly, though, the screens have a common but highly flexible format with up to four sections of displays, typically one area (122) that is often occupied by a view of the RTW component circuits array as processing progresses or displays from DbView, two other areas 123, 124 either (and usually at least one) specific to particular ones of the selectables 121 or (and usually not more than one) for importing from what is normally in another screen, and a fourth area (125) that can be specific to the current screen or allow importation from other screens or serve for functions on a “tear-off” basis that can be from any screen or from a repertoire thereof that can include options not considered specific to any particular screen or screens. The sizes of these areas will vary, or can be varied, to suit the screen involved and/or the user's preferences.

FIGS. 19A–C show a norm for the “specify” screen, typically at the areas 125, 124 and 123, respectively. Top left will thus be (FIG. 19A) for the set or target operating frequency at 126, including capability to set a maximum/minimum range at 126A, B; also for showing the phase spread (127) as a % representing skew tolerance, and setting (128) the rise/fall time of the desired timing signals whether such as by quick/faster/fastest or as a figure of merit typically in picoseconds. Bottom left will be (FIG. 19B) for total capacitance of all loads involved (129) and for capacitance per unit area (129A). Bottom left will be (FIG. 19C) for intended or target technology as to feature size (131), logic operating voltage (132), and metallisation layer thickness (133) and type (134); together with foundry selectables (135) for which stored data will be pulled out into the display, whether for interconnects (136) or for transistors (137).

FIG. 19D is relevant to the “constrain” screen, specifically to setting and displaying parameters of the conductive traces of the transmission-line endless rotary signal paths of the RTW component circuits, see as to minimum width 138, maximum overall width 139, and proportion (141) of the metallisation layer to be available to the clock. As a norm, this can be the only content of the constrain screen; but more may be imported as desired by the user.

FIG. 19E is relevant to the “solve” screen, and will normally be at bottom left (124) with the array display occupying the majority of the screen from the right hand side, say all of areas 122 and 123. This includes width (142) and spacing (143) of the transmission line traces, also inductance (144L), capacitance (144C) and resistance (144R) per unit length. This solve screen is likely to have the most “tear-off” items, often including call-ups for detail concerning loads, connect lines, array geometry, waveform preview, skew tolerance, etc; and include capability to look at and adjust at least its projected power consumption as well as adjusting the transmission line traces. This is, of course, all in the interests of user inter-action in moving from a failing or poor projected layout to a viable or better one, and the “solve” screen is intended mainly for expert users.

The “simulate” screen will have a choice between waveform and a normally mouse-operated user-probeable representation of the RTW array so that it can be inspected for waveform at any array or individual circuit position, and will usually carry analysed waveform data concerning frequency, skew, jitter etc.

FIG. 19F relates to the “worst case” screen, and can show detail of the transmission line trace connections, and/or of the transmission lines on either layer of metallisation, etc, including with magnification capabilities to aid assessment in what is adjudged to be the worst case for any part of the RTW array and/or its context of operation.

The “circuits” screen is also intended for expert users, and will show a circuit diagram for the regenerative back-to-back diode circuitry cross-connecting the transmission line traces as specifically taught and shown in the above UK patent, including equivalent trace inductance, capacitance and resistance elements complete with parameter values and capabilities for further investigation of capacitance parameters, intrinsic gate resistance, drain inductance parasitics, supply parasitics, decoupling, varactors etc; and call-up for such as FastHenry analysis.

The CAD-related teaching hereof also extends inventively to measures further aiding reviewing and specifying detail of the regenerative cross-connection circuitry indicated in FIGS. 11 and 12 as blocks 141, though without showing their related via connections to the dual conductor traces concerned. The parameter-indicating back-to-back inverter circuit diagram mentioned in relation to the “solve” screen and shown at 131 in FIG. 20 is further useful in the innovative detail review and adjustment now being described. Indeed, this cross-connection circuitry is preferably of a highly configurable nature, see FIG. 21, not only as to its inverters 142, but also as to configuration of associated pass transistors that will usually comprise both P- and N-types; and/or further for such as varactors 143 capable of fine timing signal operating frequency adjustment of up to about +/−10%, and/or of capacitors 144 capable of medium such frequency adjustment of up to about +/−25% and/or maybe even frequency dividers (not shown) for coarse frequency adjustment. All of these configurable inverter, pass transistor, varactor and capacitor provisions 142–145 are indicated diagrammatically as of three-stage type, see dashed dividers, and further as said three stages being of a binary weighted nature, see one-, two- and four-times width spacing of the dashed dividers, and control lines thereto from bus 146. It will be appreciated that binary signals from the bus 146 onto the control lines can specify maxima up to seven times minima for standard binary weightings (though not required or limited thereto). The provisions 142–145 and the action of the control lines could be by bringing the related stage into operative effect or by disabling it from operative effect.
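The binary weighting of the three stages can be illustrated by a small sketch. The function names and the representation of the control lines as a tuple of bits are assumptions for illustration; only the one-, two- and four-times weights and the resulting seven-to-one range come from the description.

```python
def enabled_weight(control_bits, weights=(1, 2, 4)):
    """Total relative weight of the stages enabled by the control lines.

    control_bits holds one bit per stage, least significant stage first;
    a stage contributes its weight only when its bit is set.
    """
    return sum(w for bit, w in zip(control_bits, weights) if bit)


def tuned_value(control_bits, unit_value):
    """Effective component value for a given unit (one-times) stage value."""
    return enabled_weight(control_bits) * unit_value
```

With all three control lines asserted the weight is 1 + 2 + 4 = 7, against 1 for the smallest single stage, which is the seven-to-one span mentioned above.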

This configurability can also be applied to padding capacitance capabilities that may then be in-built at least once per side path part of every endless RTW side path part, see 155A, B, C in FIG. 22 showing detail.

It is to be appreciated that in-built configuration capabilities of such regenerative cross-connection circuitry 141 and/or padding capacitance 151 afford very considerable adjustment capability to RTW timing signal array designers, including for automated software driven adjustment, but are also seen as having hardware aspects of invention.

Another feature which the CAD provisions hereof can handle very readily indeed, perhaps particularly using the design verification routines already discussed at length, and further having specific hardware aspects of invention, is any requirement or desire for parts of layers carrying the dual-conductors of the endless RTW signal paths to be left free, whether for other usage or as being pointless if registering with large-area functional logic such as 64-bit registers or memory etc, see at 161 in FIG. 23. As should be apparent from this Figure, all that is required is to ensure that the bounding dual-conductor parts of the endless RTW signal paths obey the impedance requirements for their junctions and do not disturb the re-circulatory transit time requirements. Resulting different impedance-matching widths of the dual-conductors are apparent.

FIG. 23 also indicates highly beneficial bounding of the whole array also with impedance-matching dual-conductor parts.

Turning to inventive aspects of the RTW component circuit formation and arraying geometry as used in FIGS. 11, 12 and 23, same affords endless electromagnetically continuous signal paths of dual-conductor transmission-line nature with a signal inversion by way of a Moebius-twist type cross-over; and does so with particular merit for implementation using two layers of metallisation, as for semiconductor integrated circuits or double-sided or multilayer printed circuit boards. In such context, and in general terms as another inventive aspect hereof, a non-intersecting plurality of dual conductors that cross another non-intersecting plurality of dual conductors with electrically insulating material between them has selective interconnections through the insulating material at crossing positions of the dual conductors of the two pluralities thereof, which selective interconnections are each one-to-one as between the dual conductors of one said plurality and the dual conductors of the other said plurality, and, for crossing positions between which the dual conductors of one and the other said pluralities alternate in bounding at least one included area, the one-to-one interconnections are different from the others at one of the crossing positions associated with the or each said included area.

For the or each said included area, one of the dual conductors of each said plurality will be inner and the other outer (of the included area concerned), and there will be two different types of one-to-one interconnections, namely between inners and outers of both pluralities or between inner of one plurality and outer of the other.

A single inners-to-outers interconnection has the Moebius-twist effect (considering the dual conductors as edges of a strip). Two or any even number of such inter-connections negates the Moebius-twist, but any odd number preserves it.

For the alternation of dual conductors from one and the other said plurality about said included area, there will be four said crossing positions, for three of which the one-to-one connection will be the same but different from the fourth. Any of the endless signal paths can share part of its path with part of another endless signal path so long as the signal rotations in each path have the same direction in the shared parts, and so long as the equal power-in/power-out and related impedance implications are met for junctions at ends of the parts.

Where the dual conductors of each said plurality are parallel with those of one said plurality orthogonal to those of the other said plurality, as in rows and columns relationship, the included area will be rectangular with said interconnections at its four corners.

For rectangular RTW signal paths, i.e. with four corners available for the interconnections, the three-the-same-but-one-different requirement for the interconnections, and the preservation of the Moebius-twist effect by one or three inners-to-outers interconnections, combine to allow every included rectangle of a configuration of the pluralities of dual conductors in rows and columns to be an active RTW circuit.
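The parity rule and the three-the-same-but-one-different requirement described above lend themselves to a simple check. The sketch below is illustrative only; representing each corner interconnection as a boolean (true for inners-to-outers) and the function names are assumptions, not part of the described software.

```python
def has_moebius_twist(corners):
    """True when the loop keeps its signal inversion.

    corners is an iterable of booleans, one per crossing position,
    True where the interconnection is of the inners-to-outers type.
    An odd count preserves the twist; an even count negates it.
    """
    return sum(bool(c) for c in corners) % 2 == 1


def three_same_one_different(corners):
    """True for a four-corner loop with exactly one odd-one-out corner."""
    corners = list(corners)
    return len(corners) == 4 and sum(bool(c) for c in corners) in (1, 3)
```

Any rectangle passing the second check has one or three inners-to-outers corners and therefore also passes the first, consistent with every included rectangle being usable as an active RTW circuit.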

FIG. 24 shows this by way of double-headed arrows indicating both of inners-to-outers interconnections and the included area of the signal path and RTW component circuit for which it has the Moebius-twist effect, but not for the row- or chain-adjacent RTW circuit.

The pitches between dual conductors of each plurality will determine the aspect ratio of the or each bounded rectangular area, thus the length of its boundary, which can be useful in relation to frequency requirements and numbers of RTW component circuits, and further useful in either having more of the dual conductors in one “layer” than in the other (as could suit for one IC metallisation layer being thicker than another) or in relation to the area occupied by the RTW component circuits (as could suit leaving one “layer” with less such occupancy).

Also as shown, it is noteworthy that vias for making the required interconnections exhibit pattern repetitions that are different for alternating rows 46 and alternating columns 47 in achieving the Moebius-twist effect for inner and outer conductive trace portions about each of the signal paths 45 with their via interconnections making a single continuous doubly circumscribing conductive trace.

These patterns of via pair connections in alternating rows are all the same in one, and successively opposite in the other; and the same applies to alternating columns.

1. A method to generate a design for timing circuitry that provides a traveling-wave type timing signal waveform along a path of transmission line nature, comprising: determining a plurality of regions of the timing circuitry, each region being sized such that a signal delay, along the path, between adjoining ones of such regions is below a particular fraction of a target operating frequency; generating a design for each of the determined regions of the timing circuitry such that, for each region individually, that region nominally has particular desired characteristics, the designs for the determined regions constituting an entire design; simulating operation of the entire design; selectively adjusting the design for at least some of the regions based on a result of the simulating step; and repeating the steps of simulating and selectively adjusting as appropriate until the result of the simulating step is a desired result.
2. The method of claim 1, wherein the step of determining a plurality of regions includes: for each region, dividing perimeters of each region into a number of segments; approximating lumped transmission line LCR for at least some of the segments; and determining relevant parameters such that time delays over each segment have a particular relationship to a function of the target frequency and the number of segments for the perimeter of that region.
3. The method of claim 2, wherein the function of the target frequency and the number of segments for the perimeter of that region is the target frequency divided by twice the number of segments.
4. The method of claim 3, wherein the particular relationship is substantial equality.
5. The method of claim 1, wherein the timing circuitry includes regenerative means distributed along the path to control voltage transitions in the timing signal waveform; and wherein the step of determining a plurality of regions includes, for each region, dividing perimeters of each region into segments; and determining a lumped capacitance of each segment to be substantially equal to a worst case load capacitance plus loop-to-loop interconnect capacitance plus active delay capacitance of the regenerative means.
6. The method of claim 2, wherein the step of determining a plurality of regions further includes, for unloaded segments, determining a padding capacitance to substantially match the capacitance of the lumped line capacitance.